The phenomenon of adversarial examples illustrates one of the most fundamental vulnerabilities of deep neural networks. Among the variety of techniques introduced to overcome this inherent weakness, adversarial training has emerged as the most effective strategy for learning robust models. Typically, this is achieved by balancing robust and natural objectives. In this work, we aim to further optimize the trade-off between robust and standard accuracy by enforcing a domain-invariant feature representation. We present a new adversarial training method, Domain Invariant Adversarial Learning (DIAL), which learns a feature representation that is both robust and domain invariant. DIAL uses a variant of Domain Adversarial Neural Network (DANN) on the natural domain and its corresponding adversarial domain. In the case where the source domain consists of natural examples and the target domain consists of adversarially perturbed ones, our method learns a feature representation that is constrained not to distinguish between natural and adversarial examples, and can therefore achieve a more robust representation. DIAL is a generic and modular technique that can be easily incorporated into any adversarial training method. Our experiments indicate that incorporating DIAL into the adversarial training process improves both robustness and standard accuracy.
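As a rough illustration of the DANN-style component described above, here is a minimal PyTorch sketch of a gradient reversal layer feeding a domain discriminator that is trained to tell natural from adversarial features while the encoder is trained to fool it. The module names, layer sizes, and the weighting `lambda_d` are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: gradient reversal + domain discriminator for DIAL-style training.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

class DIALHead(nn.Module):
    def __init__(self, feat_dim=128, num_classes=10):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Binary domain discriminator: natural (0) vs. adversarial (1).
        self.domain_disc = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feats, lambda_d=1.0):
        logits = self.classifier(feats)
        domain_logits = self.domain_disc(grad_reverse(feats, lambda_d))
        return logits, domain_logits
```

In a training loop one would, under these assumptions, combine the usual natural and robust classification losses with a cross-entropy loss on `domain_logits` over the concatenated natural/adversarial feature batch; the reversed gradient pushes the encoder toward features the discriminator cannot separate.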
Neural Representations have recently been shown to effectively reconstruct a wide range of signals from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). Inspired by coordinate inputs of previous neural representation methods, we assign a coordinate to each convolutional kernel in our network based on its position in the architecture, and optimize a predictor network to map coordinates to their corresponding weights. Similarly to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network's weights aids NeRN towards a better reconstruction. In addition, since slight perturbations in pre-trained model weights can result in a considerable accuracy loss, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet. Finally, we present two applications using NeRN, demonstrating the capabilities of the learned representations.
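To make the coordinate-to-weights mapping concrete, the following is a hedged PyTorch sketch of a NeRN-style predictor: each 3x3 convolutional kernel gets a (layer, filter, channel) coordinate, which is sinusoidally embedded and mapped by a small MLP to the kernel's 9 weights. The embedding and sizes are assumptions for illustration; the paper's exact architecture may differ.

```python
# Sketch: predictor network mapping kernel coordinates to kernel weights.
import torch
import torch.nn as nn

def positional_embedding(coords, num_freqs=10):
    # coords: (N, 3) integer kernel coordinates -> (N, 3 * 2 * num_freqs)
    freqs = 2.0 ** torch.arange(num_freqs)         # (F,)
    angles = coords.float().unsqueeze(-1) * freqs  # (N, 3, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)

class NeRNPredictor(nn.Module):
    def __init__(self, num_freqs=10, hidden=256, kernel_size=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size))

    def forward(self, coords):
        return self.net(positional_embedding(coords))

# Reconstruction objective: regress predicted kernels onto the pre-trained
# network's kernels (optionally with smoothness and distillation terms).
predictor = NeRNPredictor()
coords = torch.tensor([[0, 4, 7], [1, 0, 2]])  # (layer, filter, channel)
pred_kernels = predictor(coords)               # shape (2, 9)
```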
In 2019, Kerdels and Peters proposed a grid cell model (GCM) based on a Differential Growing Neural Gas (DGNG) network architecture as a computationally efficient way to model an Autoassociative Memory Cell (AMC) \cite{Kerdels_Peters_2019}. An important feature of the DGNG architecture with respect to possible applications in the field of computational neuroscience is its \textit{capacity}, referring to its capability to process and uniquely distinguish input signals and thereby obtain a valid representation of the input space. This study evaluates the capacity of a two-layered DGNG grid cell model on the Fashion-MNIST dataset. The focus of the study lies on the variation of layer sizes, in order to improve the understanding of capacity properties in relation to network parameters as well as its scaling properties. Additionally, parameter discussions and a plausibility check with a pixel/segment variation method are provided. It is concluded that the DGNG model is able to obtain a meaningful and plausible representation of the input space and to cope with the complexity of the Fashion-MNIST dataset even at moderate layer sizes.
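One simple way to operationalize the capacity notion above is to map every input to the pair of best-matching units in a two-layer network and count distinct codes. The sketch below uses random prototype layers as stand-ins, not an actual DGNG implementation; it only illustrates the counting idea under that assumption.

```python
# Sketch: capacity as the number of uniquely distinguishable input codes.
import numpy as np

rng = np.random.default_rng(0)

def best_matching_unit(x, prototypes):
    # Index of the prototype closest to x (L2 distance).
    return int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))

def capacity(inputs, layer1, layer2):
    codes = [(best_matching_unit(x, layer1), best_matching_unit(x, layer2))
             for x in inputs]
    return len(set(codes))  # distinct two-layer response codes

inputs = rng.random((1000, 784))  # e.g. flattened Fashion-MNIST images
layer1 = rng.random((20, 784))    # layer sizes are the varied parameter
layer2 = rng.random((15, 784))
print(capacity(inputs, layer1, layer2), "distinct codes for", len(inputs), "inputs")
```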
Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is beneficial for classification tasks, it does not cater well for sub-word prediction tasks, such as Named Entity Recognition, when considering the morphologically rich nature of Hebrew. In this paper, we argue that sequence-to-sequence generative architectures are more suitable for LLMs in the case of morphologically rich languages (MRLs) such as Hebrew. We demonstrate that by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a specialized, morpheme-based, separately fine-tuned decoder. Using this approach, our experiments show substantial improvements over previously published results on existing Hebrew NLP benchmarks. These results suggest that multilingual sequence-to-sequence models present a promising building block for NLP for MRLs.
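The text-to-text casting can be sketched with Hugging Face's off-the-shelf mT5. The task prefix, the Hebrew example, and the expected output format below are illustrative assumptions; the paper's exact prompt formats and fine-tuned checkpoints differ, and the raw `google/mt5-small` checkpoint would need fine-tuning on such pairs before producing useful output.

```python
# Sketch: casting NER as a text-to-text task with a multilingual seq2seq LM.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# The model is asked to emit entities as a string instead of per-token tags,
# sidestepping the need for a specialized morpheme-based decoder.
text = "ner: דוד בן גוריון נולד בפלונסק"  # hypothetical task prefix + input
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```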
Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any changes to the outputs, by computing several tokens in parallel. At the heart of our approach lie the observations that (1) hard language-modeling tasks often include easier subtasks that can be approximated well by more efficient models, and (2) using speculative execution and a novel sampling method, we can make exact decoding from the large models faster, by running them in parallel on the outputs of the approximation models, potentially generating several tokens concurrently, and without changing the distribution. Our method supports existing off-the-shelf models without retraining or architecture changes. We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs.
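The accept/reject rule at the heart of this scheme can be written compactly. In the sketch below, `draft_dist` and `target_dist` stand in for the small and large models: each maps a token prefix to a probability vector over a toy vocabulary. The parallel scoring of the large model is simulated by sequential calls; the sampling logic itself (accept with probability min(1, p/q), otherwise resample from the normalized residual (p - q)+) is the standard construction that leaves the target distribution unchanged.

```python
# Sketch: one speculative decoding step over a toy vocabulary.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(prefix, draft_dist, target_dist, gamma=4):
    # 1) The cheap draft model proposes gamma tokens autoregressively.
    guesses = []
    ctx = list(prefix)
    for _ in range(gamma):
        q = draft_dist(ctx)
        tok = rng.choice(len(q), p=q)
        guesses.append((tok, q))
        ctx.append(tok)
    # 2) The large model scores all gamma prefixes (in the real algorithm,
    #    one parallel pass; simulated here by per-prefix calls).
    accepted = list(prefix)
    for i, (tok, q) in enumerate(guesses):
        p = target_dist(list(prefix) + [g for g, _ in guesses[:i]])
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)          # accept the draft token
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()    # resample from the leftover mass
            accepted.append(rng.choice(len(p), p=residual))
            return accepted               # stop at the first rejection
    # 3) All guesses accepted: one bonus token from the target model.
    p = target_dist(accepted)
    accepted.append(rng.choice(len(p), p=p))
    return accepted
```

Note how a fully accepted round yields gamma + 1 tokens from a single (parallel) large-model pass, which is the source of the reported 2X-3X speedup.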
Denoising diffusion models (DDMs) have led to staggering performance leaps in image generation, editing and restoration. However, existing DDMs use very large datasets for training. Here, we introduce a framework for training a DDM on a single image. Our method, which we coin SinDDM, learns the internal statistics of the training image by using a multi-scale diffusion process. To drive the reverse diffusion process, we use a fully-convolutional light-weight denoiser, which is conditioned on both the noise level and the scale. This architecture allows generating samples of arbitrary dimensions, in a coarse-to-fine manner. As we illustrate, SinDDM generates diverse high-quality samples, and is applicable in a wide array of tasks, including style transfer and harmonization. Furthermore, it can be easily guided by external supervision. Particularly, we demonstrate text-guided generation from a single image using a pre-trained CLIP model.
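A minimal sketch of the conditioning scheme described above: a small fully-convolutional denoiser receives the noise level and the scale index as extra constant input channels, so the same weights can be reused across scales and arbitrary image sizes. Channel widths and the conditioning-by-channels choice are illustrative assumptions, not SinDDM's exact architecture.

```python
# Sketch: fully-convolutional denoiser conditioned on noise level and scale.
import torch
import torch.nn as nn

class ScaleConditionedDenoiser(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # 3 image channels + 1 noise-level map + 1 scale map.
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, x, noise_level, scale):
        b, _, h, w = x.shape
        # Broadcast the two scalars to constant spatial maps.
        lvl = torch.full((b, 1, h, w), float(noise_level))
        scl = torch.full((b, 1, h, w), float(scale))
        return self.net(torch.cat([x, lvl, scl], dim=1))

# Being fully convolutional, the same weights run on any resolution,
# which is what enables samples of arbitrary dimensions:
net = ScaleConditionedDenoiser()
print(net(torch.randn(1, 3, 48, 64), 0.3, 2).shape)  # torch.Size([1, 3, 48, 64])
```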
Power system state estimation faces different types of anomalies. These may include bad data caused by gross measurement errors or communication system failures. Depending on the implemented state estimation method, sudden changes in load or generation can also be treated as anomalies. Furthermore, considering the power grid as a cyber-physical system, state estimation becomes vulnerable to false data injection attacks. Existing anomaly classification methods cannot accurately classify (distinguish between) the three types of anomalies above, especially when discriminating between sudden load changes and false data injection attacks. This paper proposes a new algorithm for detecting the presence of an anomaly, classifying the anomaly type, and identifying its origin, i.e., the state variables affected by a sudden change or targeted by a false data injection attack. The algorithm combines analytical and machine learning (ML) methods. The first stage exploits an analytical approach to detect the presence of an anomaly by combining $\chi^2$ detection indices. The second stage utilizes ML to classify the anomaly type and identify its origin, in particular to discriminate between sudden load changes and false data injection attacks. The proposed ML-based method is trained to be independent of the network configuration, which eliminates the need to retrain the algorithm after network topology changes. Results obtained by implementing the proposed algorithm on the IEEE 14-bus test system demonstrate the accuracy and effectiveness of the proposed algorithm.
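The first, analytical stage can be illustrated with the classical $\chi^2$ test on state-estimation residuals: under normal operation the weighted residual statistic approximately follows a $\chi^2$ distribution with $m - n$ degrees of freedom ($m$ measurements, $n$ states), and an anomaly is flagged when it exceeds the chosen quantile. The threshold, sizes, and numbers below are illustrative, not the paper's exact detection indices.

```python
# Sketch: chi-square test on weighted state-estimation residuals.
import numpy as np
from scipy.stats import chi2

def chi_square_anomaly(z, h_of_x, sigma, m, n, alpha=0.01):
    # z: measurements, h_of_x: model-predicted measurements,
    # sigma: measurement standard deviation.
    r = (z - h_of_x) / sigma
    J = float(np.sum(r ** 2))                  # weighted residual statistic
    threshold = chi2.ppf(1.0 - alpha, df=m - n)
    return J > threshold, J, threshold

z = np.array([1.02, 0.97, 1.10, 0.95, 1.00])
h_of_x = np.array([1.00, 0.98, 1.01, 0.96, 1.01])
flag, J, thr = chi_square_anomaly(z, h_of_x, sigma=0.02, m=5, n=2)
print(flag, round(J, 2), round(thr, 2))  # True 22.0 11.34: anomaly present
```

The ML stage would then take over only when this test fires, classifying the anomaly type and locating the affected state variables.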
High energy density physics (HEDP) experiments commonly involve a dynamic wave-front propagating inside a low-density foam. This effect influences the foam's density and, consequently, its transparency. A common problem in foam production is the creation of defective foams. Accurate information on their dimensions and homogeneity is required to classify the foams' quality. Therefore, these parameters are characterized using a 3D measuring laser confocal microscope. For each foam, five images are taken: two 2D images representing the top and bottom foam planes, and three images of side cross-sections from 3D scans. An expert must do the complex, demanding, and exhausting work of manually classifying the foam's quality from the image set before determining whether the foam can be used in experiments. Currently, quality has two binary levels: normal vs. defective. At the same time, experts are often required to classify a sub-category of normal-defective, i.e., foams that are defective but may still be useful in experiments. This sub-category is problematic due to inconclusive judgment that is primarily intuitive. In this work, we present a novel state-of-the-art multi-view deep learning classification model that mimics the physicist's perspective by automatically determining the foams' quality classification, thereby aiding the experts. Our model achieved 86% accuracy on the upper and lower surface foam planes and 82% on the entire set, suggesting interesting heuristics for this problem. A significant added value of this work is the ability to regress the foam quality instead of making a binary deduction, and even to explain the decision visually. The source code used in this work, as well as other relevant sources, is available at: https://github.com/scientific-computing-lab-nrcn/multi-view-foams.git
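A hedged sketch of the multi-view idea: a shared convolutional encoder is applied to all five images of a foam (top plane, bottom plane, three cross-sections), the per-view features are pooled, and a single head predicts the quality class. All layer sizes and the average-pool fusion are assumptions for illustration, not the paper's architecture.

```python
# Sketch: shared-encoder multi-view classifier over five foam images.
import torch
import torch.nn as nn

class MultiViewFoamClassifier(nn.Module):
    def __init__(self, num_views=5, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, num_classes)

    def forward(self, views):          # views: (batch, num_views, 1, H, W)
        b, v, c, h, w = views.shape
        feats = self.encoder(views.reshape(b * v, c, h, w)).reshape(b, v, -1)
        return self.head(feats.mean(dim=1))  # fuse by averaging over views

model = MultiViewFoamClassifier()
print(model(torch.randn(2, 5, 1, 128, 128)).shape)  # torch.Size([2, 2])
```

Replacing the classification head with a single regression output would give the quality-regression variant mentioned above.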
This paper presents a new approach for controlling emotion in symbolic music generation with Monte Carlo Tree Search. We use Monte Carlo Tree Search as a decoding mechanism to steer the probability distribution learned by a language model towards a given emotion. At every step of the decoding process, we use the Predictor Upper Confidence for Trees (PUCT) to search for sequences that maximize the average values of emotion and quality, as given by an emotion classifier and a discriminator, respectively. We use a language model as the policy of the pipeline, and a combination of the emotion classifier and the discriminator as its value function. To decode the next token in a piece of music, we sample from the distribution of node visits created during the search process. We evaluate the quality of the generated samples with respect to human-composed pieces using a set of objective metrics computed directly from the generated samples. We also perform a user study to evaluate how human subjects perceive the quality and emotion of the generated samples. We compare PUCT against Stochastic Bi-Objective Beam Search (SBBS) and Conditional Sampling (CS). Results suggest that PUCT outperforms SBBS and CS in almost all metrics of music quality and emotion.
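The two decoding ingredients described above can be sketched directly: the PUCT score used to pick which child to explore during the search, and sampling the next token from the visit-count distribution once the search budget is spent. The constant `c_puct`, the temperature, and the toy numbers are illustrative assumptions; only the formulas follow the standard PUCT construction.

```python
# Sketch: PUCT child selection and visit-count sampling for token decoding.
import numpy as np

rng = np.random.default_rng(0)

def puct_select(Q, N, prior, c_puct=1.0):
    # Q: mean value per child (emotion + quality), N: visit counts,
    # prior: language-model probabilities acting as the policy.
    total = N.sum()
    scores = Q + c_puct * prior * np.sqrt(total + 1) / (1 + N)
    return int(np.argmax(scores))

def sample_next_token(N, temperature=1.0):
    probs = N ** (1.0 / temperature)
    probs = probs / probs.sum()
    return int(rng.choice(len(N), p=probs))

Q = np.array([0.2, 0.5, 0.1])      # values from classifier + discriminator
N = np.array([10.0, 30.0, 5.0])    # visit counts after the search
prior = np.array([0.5, 0.3, 0.2])  # language-model policy
print(puct_select(Q, N, prior), sample_next_token(N))
```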
We study boosted ensemble models for off-policy learning from logged bandit feedback. Toward this goal, we propose a new boosting algorithm that directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the empirical risk decreases (possibly exponentially fast) with each boosting round, as long as a "weak" learning condition is satisfied. We further show how the base learners reduce to standard supervised learning problems. Experiments indicate that our algorithm can outperform deep off-policy learning and methods that simply regress on the observed rewards, demonstrating the benefits of both boosting and choosing the right learning objective.
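A hedged numpy sketch of the core idea: boost an ensemble by following the functional gradient of an inverse-propensity-score (IPS) estimate of the policy's expected reward, rather than regressing on the observed rewards. The linear base learner, step size, synthetic data, and softmax policy are all illustrative assumptions, not the paper's algorithm.

```python
# Sketch: functional-gradient boosting of an IPS policy-value estimate.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 5, 3                       # logged samples, features, actions
X = rng.normal(size=(n, d))
a = rng.integers(k, size=n)               # logged actions
p = np.full(n, 1.0 / k)                   # logging propensities
r = (a == (X[:, 0] > 0).astype(int)).astype(float)  # synthetic rewards

def softmax(F):
    e = np.exp(F - F.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ips_reward(F):
    pi = softmax(F)
    return np.mean(r * pi[np.arange(n), a] / p)  # estimated expected reward

F = np.zeros((n, k))                      # ensemble scores, one per action
for t in range(20):                       # boosting rounds
    pi = softmax(F)
    # Functional gradient of the IPS objective w.r.t. the scores F.
    G = -pi * (r * pi[np.arange(n), a] / p)[:, None]
    G[np.arange(n), a] += r * pi[np.arange(n), a] / p
    # Base learner: least-squares fit of the gradient from the features,
    # i.e. the base learner reduces to a supervised regression problem.
    W, *_ = np.linalg.lstsq(X, G, rcond=None)
    F += 0.5 * X @ W                      # add the weak learner's scores
    print(t, round(ips_reward(F), 3))     # estimate should increase
```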